Abstract

The TCP window size process appears in the modeling of the famous Transmission
Control Protocol used for data transmission over the Internet. This continuous time
Markov process takes its values in $[0,\infty)$, is ergodic and irreversible. The sample paths
are piecewise linear deterministic and the whole randomness of the dynamics comes
from the jump mechanism. The aim of the present paper is to provide quantitative
estimates for the exponential convergence to equilibrium, in terms of the total variation
and Wasserstein distances.

1 Introduction and main results

The TCP protocol is one of the main data transmission protocols of the Internet. It
has been designed to adapt to the various traffic conditions of the actual network. For
a connection, the maximum number of packets that can be sent at each round is given
by a variable $W$, called the congestion window size. If all the $W$ packets are successfully
transmitted, then $W$ is increased by 1, otherwise it is divided by 2 (detection of a
congestion). As shown in [DGR02, GRZ04, OKM96], a correct scaling of this process leads
to a continuous time Markov process, called the TCP window size process. This process
$X = (X_t)_{t\geq 0}$ has $[0,\infty)$ as state space and its infinitesimal generator is given, for any
smooth function $f : [0,\infty) \to \mathbb{R}$, by

$$Lf(x) = f'(x) + x\,\big(f(x/2) - f(x)\big). \tag{1}$$

The semi-group $(P_t)_{t\geq 0}$ associated to $(X_t)_{t\geq 0}$ is classically defined by

$$P_t f(x) = \mathbb{E}\big(f(X_t) \mid X_0 = x\big),$$

for any smooth function $f$. Moreover, for any probability measure $\nu$ on $[0,\infty)$, $\nu P_t$ stands
for the law of $X_t$ when $X_0$ is distributed according to $\nu$.

The process $X_t$ increases linearly between jump times that occur with a state-dependent
rate $X_t$. The first jump time of $X$ starting at $x \geq 0$ has the law of $\sqrt{2E + x^2} - x$ where
$E$ is a random variable with exponential law of parameter 1. In other words, the law of
this jump time has a density with respect to the Lebesgue measure on $(0,+\infty)$ given by

$$f_x : t \in (0,+\infty) \mapsto (x+t)\,e^{-t^2/2 - xt}, \tag{2}$$

see [CMP10] for further details.
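The representation of the first jump time as $\sqrt{2E + x^2} - x$ lends itself to a direct simulation of the process. The following is a minimal Python sketch, not taken from the paper: the function names `first_jump_time`, `simulate_tcp` and `survival` are hypothetical, and the survival function $e^{-t^2/2 - xt}$ is the one obtained by integrating the density (2).

```python
import math
import random


def first_jump_time(x, rng):
    """Sample the first jump time of the TCP process started at x >= 0.

    Uses the representation sqrt(2E + x^2) - x with E ~ Exp(1).
    """
    e = rng.expovariate(1.0)
    return math.sqrt(2.0 * e + x * x) - x


def simulate_tcp(x0, horizon, rng):
    """Return X_horizon for one path started at x0 (hypothetical helper)."""
    t, x = 0.0, x0
    while True:
        tau = first_jump_time(x, rng)
        if t + tau > horizon:
            # No further jump before the horizon: linear growth at rate 1.
            return x + (horizon - t)
        t += tau
        x = (x + tau) / 2.0  # grow linearly, then halve at the jump


def survival(x, t):
    """P_x(T_1 > t) = exp(-t^2/2 - x t), the integral of the density (2)."""
    return math.exp(-0.5 * t * t - x * t)
```

Restarting the sampler from the post-jump position is legitimate because the process is Markov: after each jump the next waiting time only depends on the current state.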

The sample paths of $X$ are deterministic between jumps, the jumps are multiplicative,
and the whole randomness of the dynamics relies on the jump mechanism. Of course,
the randomness of $X$ may also come from a random initial value. The process $(X_t)_{t\geq 0}$
appears as an Additive Increase Multiplicative Decrease process (AIMD), but also as a very
special Piecewise Deterministic Markov Process (PDMP). In this direction, [MZ09] gives
a generalization of the scaling procedure to interpret various PDMPs as limits of discrete
time Markov chains. In the real world (Internet), the AIMD mechanism allows a good
compromise between the minimization of network congestion time and the maximization
of mean throughput. One could consider more general processes (introducing a random
multiplicative factor or modifying the jump rate) but their study is essentially the same
as that of the present process.

The TCP window size process $X$ is ergodic and admits a unique invariant law $\mu$, as
can be checked using a suitable Lyapunov function (for instance $V(x) = 1 + x$, see e.g.
[BCG08, HDOS, MT93, HM11] for the Meyn-Tweedie-Foster-Lyapunov technique). It can
also be shown that $\mu$ has a density on $(0,+\infty)$ given by



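$$\mu(x) = \frac{\sqrt{2/\pi}}{\prod_{n\geq 0}\big(1 - 2^{-(2n+1)}\big)} \sum_{n\geq 0} \frac{(-1)^n\, 2^{2n}}{\prod_{k=1}^{n}(2^{2k}-1)}\; e^{-2^{2n-1}x^2} \tag{3}$$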

(this explicit formula is derived in [DGR02, Prop. 10], see also [GRZ04, MZ09, MZ06,
GK09] for further details). In particular, one can notice that the density of $\mu$ has a
Gaussian tail at $+\infty$ and that all its derivatives are null at the origin. Nevertheless, this process
is irreversible since time reversed sample paths are not sample paths and it has infinite
support (see [LP11] for the description of the reversed process). In [RR96], explicit bounds
for the exponential rate of convergence to equilibrium in total variation distance are
provided for generic Markov processes in terms of a suitable Lyapunov function but these
estimates are very poor even for classical examples such as the Ornstein-Uhlenbeck process.
They can be improved following [RT00] if the process under study is stochastically
monotone, that is, if its semi-group $(P_t)_{t\geq 0}$ is such that $x \mapsto P_t f(x)$ is nondecreasing as soon as
$f$ is nondecreasing. Unfortunately, due to the fact that the jump rate is a nondecreasing
function of the position, the TCP process is not stochastically monotone. Moreover, we
will see that our coupling provides better estimates for the example studied in [RT00].

The work [CMP10] was a first attempt to use the specific dynamics of the TCP process
to get explicit rates of convergence of the law of $X_t$ to the invariant measure $\mu$. The answer
was partial and a bit disappointing since the authors did not succeed in proving explicit
exponential rates.

Our aim in the present paper is to go one step further by providing exponential rates of
convergence for several classical distances: Wasserstein distance of order $p \geq 1$ and total
variation distance. Let us recall briefly some definitions.

Definition 1.1. If $\nu$ and $\tilde\nu$ are two probability measures on $\mathbb{R}$, we will call a coupling of
$\nu$ and $\tilde\nu$ any probability measure on $\mathbb{R}\times\mathbb{R}$ such that the two marginals are $\nu$ and $\tilde\nu$. Let
us denote by $\Gamma(\nu,\tilde\nu)$ the set of all the couplings of $\nu$ and $\tilde\nu$.

Definition 1.2. For every $p \geq 1$, the Wasserstein distance $W_p$ of order $p$ between two
probability measures $\nu$ and $\tilde\nu$ on $\mathbb{R}$ with finite $p$-th moment is defined by

$$W_p(\nu,\tilde\nu) = \Big(\inf_{\Pi\in\Gamma(\nu,\tilde\nu)} \int_{\mathbb{R}^2} |x-y|^p\,\Pi(dx,dy)\Big)^{1/p}. \tag{4}$$

Definition 1.3. The total variation distance between two probability measures $\nu$ and $\tilde\nu$
on $\mathbb{R}$ is given by

$$\|\nu - \tilde\nu\|_{TV} = \inf_{\Pi\in\Gamma(\nu,\tilde\nu)} \int_{\mathbb{R}^2} \mathbf{1}_{\{x\neq y\}}\,\Pi(dx,dy).$$

It is well known that, for any $p \geq 1$, the convergence in Wasserstein distance of order $p$
is equivalent to weak convergence together with convergence of all moments up to order $p$,
see e.g. [Rac91, Vil03]. A sequence of probability measures $(\nu_n)_{n\geq 1}$ bounded in $L^p$ which
converges to $\nu$ in total variation norm converges also for the $W_p$ metrics. The converse is
false: if $\nu_n = \delta_{1/n}$ then $(\nu_n)_{n\geq 1}$ converges to $\delta_0$ for the distance $W_p$ whereas $\|\nu_n - \delta_0\|_{TV}$
is equal to 1 for any $n \geq 1$.

Any coupling $(X, \tilde X)$ of $(\nu, \tilde\nu)$ provides an upper bound for these distances. One can
find in [Lin92] a lot of efficient ways to construct smart couplings in many cases. In the
present work, we essentially use the coupling that was introduced in [CMP10]. Firstly, we
improve the estimate for its rate of convergence in Wasserstein distances from polynomial
to exponential bounds.

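Theorem 1.4. Let us define

$$M = \frac{\sqrt{2}\,(3+\sqrt{3})}{8} \approx 0.84 \quad\text{and}\quad \lambda = \sqrt{2}\,(1-\sqrt{M}) \approx 0.12. \tag{5}$$

For any $\tilde\lambda < \lambda$, any $p \geq 1$ and any $t_0 > 0$, there is a constant $C = C(p,\tilde\lambda,t_0)$ such that,
for any initial probability measures $\nu$ and $\tilde\nu$ and any $t \geq t_0$,

$$W_p(\nu P_t, \tilde\nu P_t) \leq C \exp\Big(-\frac{\tilde\lambda}{p}\,t\Big).$$
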
Secondly, we introduce a modified coupling to get total variation estimates. 




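Theorem 1.5. For any $\tilde\lambda < \lambda$ and any $t_0 > 0$, there exists $C$ such that, for any initial
probability measures $\nu$ and $\tilde\nu$ and any $t \geq t_0$,

$$\|\nu P_t - \tilde\nu P_t\|_{TV} \leq C \exp\Big(-\frac{2\tilde\lambda}{3}\,t\Big),$$

where $\lambda$ is given by (5).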



Remark 1.6. In both Theorems 1.4 and 1.5, no assumption is required on the moments
nor regularity of the initial measures. Note however that following Remark 3.4, one can
obtain contraction type bounds when the initial measures $\nu$ and $\tilde\nu$ have initial moments
of sufficient orders. In particular they hold uniformly over the Dirac measures. If $\tilde\nu$ is
chosen to be the invariant measure $\mu$, these theorems provide exponential convergence to
equilibrium.

The remainder of the paper is organized as follows. We derive in Section 2 precise
upper bounds for the moments of the invariant measure $\mu$ and the law of $X_t$. Section 3 and
Section 4 are respectively devoted to the proofs of Theorem 1.4 and Theorem 1.5. Unlike
the classical approach "à la" Meyn-Tweedie, our total variation estimate is obtained by
applying a Wasserstein coupling for most of the time, then trying to couple the two paths
in one attempt. This idea is then adapted to other processes: Section 5 deals with two
simpler PDMPs already studied in [PROSL ILRM K 'MPlVi\ iRTHn] and Section |6] is dedicated 
to diffusion processes. 



2 Moment estimates 

The aim of this section is to provide accurate bounds for the moments of $X_t$. In particular,
we establish below that any moment of $X_t$ is bounded uniformly over the initial value $X_0$.
Let $p > 0$ and $\alpha_p(t) = \mathbb{E}(X_t^p)$. Then one has by direct computation

$$\alpha_p'(t) = p\,\alpha_{p-1}(t) - (1 - 2^{-p})\,\alpha_{p+1}(t). \tag{6}$$

2.1 Moments of the invariant measure

Equation (6) implies in particular that, if $m_p$ denotes the $p$-th moment of the invariant
measure $\mu$ of the TCP process ($m_p = \int x^p\,\mu(dx)$), then for any $p > 0$

$$m_{p+1} = \frac{p}{1-2^{-p}}\; m_{p-1}.$$

It gives all even moments of $\mu$: $m_2 = 2$, $m_4 = 48/7$, ... and all the odd moments in terms
of the mean. Nevertheless, the mean itself cannot be explicitly determined. Applying the
same technique to $\log X_t$, one gets the relation $\log(2)\,m_1 = m_{-1}$. With Jensen's inequality,
this implies that $1/\sqrt{\log 2} \leq m_1 \leq \sqrt{2}$.
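As an illustration of the moment recursion, setting the derivative in (6) to zero at stationarity gives $m_{p+1} = p\,m_{p-1}/(1-2^{-p})$, which can be iterated from $m_0 = 1$ over odd $p$. The sketch below is only a sanity check in exact arithmetic; the helper name `even_moments` is hypothetical.

```python
from fractions import Fraction


def even_moments(count):
    """Return [m_2, m_4, ..., m_{2*count}] of the invariant measure.

    Uses m_{p+1} = p / (1 - 2^{-p}) * m_{p-1} with odd p and m_0 = 1.
    """
    moments = []
    m = Fraction(1)  # m_0
    for k in range(count):
        p = 2 * k + 1
        m = Fraction(p) / (1 - Fraction(1, 2 ** p)) * m
        moments.append(m)
    return moments
```

For instance `even_moments(2)` returns the two values $m_2 = 2$ and $m_4 = 48/7$ as exact fractions.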

2.2 Uniform bounds for the moments at finite times 

The fact that the jump rate goes to infinity at infinity gives bounds on the moments at
any positive time that are uniform over the initial distribution.

Lemma 2.1. For any $p \geq 1$ and $t > 0$,

$$M_{p,t} := \sup_{x\geq 0} \mathbb{E}_x(X_t^p) \leq \Big(\sqrt{2p} + \frac{2p}{t}\Big)^p.$$

Proof. One deduces from (6) and Jensen's inequality that

$$\alpha_p'(t) \leq p\,\alpha_p(t)^{1-1/p} - (1-2^{-p})\,\alpha_p(t)^{1+1/p}.$$

Let $\beta_p(t) = \alpha_p(t)^{1/p} - \sqrt{2p}$. Then, using the fact that $1 - 2^{-p} \geq 1/2$,

$$\beta_p'(t) = \frac{1}{p}\,\alpha_p(t)^{1/p-1}\alpha_p'(t) \leq 1 - \frac{1}{2p}\,\alpha_p(t)^{2/p} = -\sqrt{\frac{2}{p}}\,\beta_p(t) - \frac{1}{2p}\,\beta_p(t)^2.$$

In particular, $\beta_p'(t) < 0$ as soon as $\beta_p(t) > 0$.

Let us now fix $t > 0$. If $\beta_p(t) \leq 0$, then $\alpha_p(t) \leq (2p)^{p/2}$ and the lemma is proven.
We assume now that $\beta_p(t) > 0$. By the previous remark, this implies that the function
$s \mapsto \beta_p(s)$ is strictly decreasing, hence positive, on the interval $[0,t]$. Consequently, for
any $s \in [0,t]$,

$$\beta_p'(s) \leq -\frac{1}{2p}\,\beta_p(s)^2.$$

Integrating this last inequality gives

$$\frac{1}{\beta_p(t)} \geq \frac{1}{\beta_p(0)} + \frac{t}{2p} \geq \frac{t}{2p},$$

hence the lemma. □



Let us derive from Lemma 2.1 some upper bounds for the right tails of $\delta_x P_t$ and $\mu$.

Corollary 2.2. For any $t > 0$ and $r \geq 2e(1 + 1/t)$, one has

$$\mathbb{P}_x(X_t \geq r) \leq \exp\Big(-\frac{t}{2e(t+1)}\,r\Big). \tag{7}$$

Moreover, if $X$ is distributed according to the invariant measure $\mu$ then, for any $r \geq \sqrt{2e}$,
one has

$$\mathbb{P}(X \geq r) \leq \exp\Big(-\frac{r^2}{4e}\Big).$$

Proof. Let $t > 0$ and $a = 2(1 + 1/t)$. Notice that, for any $p \geq 1$, $\mathbb{E}_x(X_t^p)$ is smaller than
$(ap)^p$. As a consequence, for any $p \geq 1$ and $r > 0$,

$$\mathbb{P}_x(X_t \geq r) \leq \exp\big(p\log(ap/r)\big).$$

Assuming that $r \geq ea$, we let $p = r/(ea)$ to get:

$$\mathbb{P}_x(X_t \geq r) \leq \exp\Big(-\frac{r}{ea}\Big) = \exp\Big(-\frac{t}{2e(t+1)}\,r\Big).$$

For the invariant measure, the upper bound is better: $\mathbb{E}(X^p) \leq (2p)^{p/2}$. Then, the Markov
inequality provides that, for any $p \geq 1$,

$$\mathbb{P}(X \geq r) \leq \exp\Big(p\log\frac{\sqrt{2p}}{r}\Big).$$

As above, if $r^2 \geq 2e$, one can choose $p = r^2/(2e)$ to get the desired bound. □

Remark 2.3. A better deviation bound should be expected from the expression (3) of
the density of $\mu$. Indeed, one can get a sharp result (see [CMP10]). However, in the sequel
we only need the deviation bound (7).

3 Exponential convergence in Wasserstein distance



This section is devoted to the proof of Theorem 1.4. We use the coupling introduced
in [CMP10]. Let us briefly recall the construction and the dynamics of this stochastic
process on $\mathbb{R}_+^2$ whose marginals are two TCP processes. It is defined by the following
generator

$$\mathfrak{L}f(x,y) = (\partial_x + \partial_y)f(x,y) + y\,\big(f(x/2,y/2) - f(x,y)\big) + (x-y)\,\big(f(x/2,y) - f(x,y)\big)$$

when $x \geq y$ and symmetric expression for $x < y$. We will call the dynamical coupling
defined by this generator the Wasserstein coupling of the TCP process (see Figure 1 for a
graphical illustration of this coupling). This coupling is the only one such that the lower
component never jumps alone. Let us give the pathwise interpretation of this coupling.
Between two jump times, the two coordinates increase linearly with rate 1. Moreover, two
"jump processes" are simultaneously in action:

1. with a rate equal to the minimum of the two coordinates, they jump (i.e. they are
divided by 2) simultaneously,

2. with a rate equal to the distance between the two coordinates (which is constant
between two jump times), the bigger one jumps alone.
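The pathwise description above can be turned into a small simulation with two competing clocks. This is a hedged sketch rather than the authors' code: the function name `simulate_wasserstein_coupling` is hypothetical, and the joint jump time is sampled by inverting the integrated rate $\int_0^s (m+u)\,du$, where $m$ is the current minimum of the two coordinates.

```python
import random


def simulate_wasserstein_coupling(x, y, horizon, rng):
    """Simulate (X_t, Y_t) at time `horizon` under the coupling described above.

    Between jumps both coordinates grow at rate 1, so the gap d = |x - y| is
    constant while the smaller coordinate grows linearly.
    """
    t = 0.0
    while True:
        lo, d = min(x, y), abs(x - y)
        # Joint jumps occur at rate lo + s: invert lo*s + s^2/2 = Exp(1) sample.
        e = rng.expovariate(1.0)
        s_joint = (lo * lo + 2.0 * e) ** 0.5 - lo
        # The bigger coordinate jumps alone at the constant rate d.
        s_solo = rng.expovariate(d) if d > 0 else float("inf")
        s = min(s_joint, s_solo)
        if t + s >= horizon:
            r = horizon - t
            return x + r, y + r
        t, x, y = t + s, x + s, y + s
        if s_joint <= s_solo:
            x, y = x / 2.0, y / 2.0  # simultaneous jump
        elif x >= y:
            x = x / 2.0  # the bigger one jumps alone
        else:
            y = y / 2.0
```

With this construction the bigger coordinate jumps at total rate (minimum + gap), which is its own value, and the smaller one at rate equal to its own value, so each marginal has the jump rate of a TCP process.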



Figure 1: Two trajectories following the Wasserstein coupling; the bigger jumping alone
can be good, making the distance between both trajectories smaller, or bad.

The existence of non-simultaneous jumps implies that the generator does not act in a good
way on functions directly associated to Wasserstein distances. To see this, let us define
$V_p(x,y) = |x-y|^p$. When we compute $\mathfrak{L}V_p$, the term coming from the deterministic drift
disappears (since $V_p$ is unchanged under the flow), and we get for $x/2 \leq y \leq x$:

$$\mathfrak{L}V_p(x,y) = -y\,(1-2^{-p})\,V_p(x,y) + (x-y)\big((y - x/2)^p - (x-y)^p\big).$$

For example, choosing $p = 1$ gives:

$$\mathfrak{L}V_1(x,y) = -V_1(x,y)\big(y/2 - (y - x/2) + (x-y)\big) = -(3/2)\,V_1(x,y)^2.$$

This shows that $\mathbb{E}[|X_t - Y_t|]$ decreases, but only gives a polynomial bound: the problem
comes from the region where $x - y$ is already very small.
Choosing $p = 2$, we get

$$\mathfrak{L}V_2(x,y) = -(3/4)\,y\,V_2(x,y) + (x-y)\big(-(3/4)x^2 + xy\big).$$

The effect of the jumps of $X$ is even worse: if $x = 1$ and $x - y$ is small, $\mathfrak{L}V_2(x,y)$ is positive
($\approx (1/4)(x-y)$).

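For $p = 1/2$, the situation is in fact more favorable: indeed, for $0 < y \leq x$,

$$\begin{aligned}
\mathfrak{L}V_{1/2}(x,y) &= -y\Big(1 - \frac{1}{\sqrt{2}}\Big)V_{1/2}(x,y) + (x-y)\Big(\sqrt{|y - x/2|} - \sqrt{x-y}\Big)\\
&= -V_{1/2}(x,y)\Big(x - \frac{y}{\sqrt{2}} - \sqrt{(x-y)\,|y-x/2|}\Big)\\
&= -x\,V_{1/2}(x,y)\,\big(1 - \varphi(y/x)\big),
\end{aligned}$$

with

$$\varphi(u) = \frac{u}{\sqrt{2}} + \sqrt{(1-u)\,|u-1/2|} \quad\text{for } u \in [0,1].$$

By a direct computation, one gets that

$$M := \max_{0\leq u\leq 1}\varphi(u) = \varphi\Big(\frac{9+\sqrt{3}}{12}\Big) = \frac{\sqrt{2}\,(3+\sqrt{3})}{8} \approx 0.8365.$$

Hence, when $0 < y \leq x$,

$$\mathfrak{L}V_{1/2}(x,y) \leq -x\,\lambda\,V_{1/2}(x,y), \tag{8}$$

with $\lambda = 1 - M \approx 0.1635$. This would give an exponential decay for $V_{1/2}$ if $x$ was bounded
below: the problem comes from the region where $x$ is small and the process does not jump.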
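The value of $M$ can be checked numerically. The sketch below assumes that the function being maximized is $\varphi(u) = u/\sqrt{2} + \sqrt{(1-u)|u-1/2|}$ on $[0,1]$, and compares a brute-force grid maximum with the closed form $\sqrt{2}(3+\sqrt{3})/8$ attained at $u = (9+\sqrt{3})/12$; the names `phi`, `grid_max`, `M_CLOSED_FORM` and `U_STAR` are illustrative.

```python
import math


def phi(u):
    """phi(u) = u / sqrt(2) + sqrt((1 - u) * |u - 1/2|) on [0, 1]."""
    return u / math.sqrt(2.0) + math.sqrt((1.0 - u) * abs(u - 0.5))


def grid_max(n=200_000):
    """Brute-force maximum of phi over a uniform grid of [0, 1]."""
    best_u = max((i / n for i in range(n + 1)), key=phi)
    return best_u, phi(best_u)


M_CLOSED_FORM = math.sqrt(2.0) * (3.0 + math.sqrt(3.0)) / 8.0
U_STAR = (9.0 + math.sqrt(3.0)) / 12.0
```

A grid search is of course no proof, but it is a cheap guard against sign or transcription errors in the closed form.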

To overcome this problem, we replace $V_{1/2}$ with the function

$$\widetilde V(x,y) = \psi(x\vee y)\,|x-y|^{1/2} = \psi(x\vee y)\,V_{1/2}(x,y),$$

where

$$\psi(x) = \begin{cases} 1 + \alpha\,(1 - x/x_0)^2, & x \leq x_0,\\ 1, & x \geq x_0,\end{cases}$$

the two positive parameters $\alpha$ and $x_0$ being chosen below. The negative slope of $\psi$ for
$x$ small will be able to compensate the fact that the bound from (8) tends to 0 with $x$,
hence to give a uniform bound. Indeed, for $0 < y \leq x$,

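$$\begin{aligned}
\mathfrak{L}\widetilde V(x,y) &= \psi'(x)\,(x-y)^{1/2} + y\,\psi(x/2)\,\frac{(x-y)^{1/2}}{\sqrt{2}} + (x-y)\,\psi(x/2\vee y)\,|x/2-y|^{1/2} - x\,\widetilde V(x,y)\\
&= -\widetilde V(x,y)\Big[-\frac{\psi'(x)}{\psi(x)} + x - \frac{\psi(x/2)}{\psi(x)}\,\frac{y}{\sqrt{2}} - \frac{\psi(x/2\vee y)}{\psi(x)}\sqrt{(x-y)\,|y-x/2|}\Big]\\
&\leq -\widetilde V(x,y)\Big[-\frac{\psi'(x)}{\psi(x)} + x\big(1 - (1+\alpha)\,\varphi(y/x)\big)\Big]\\
&\leq -\widetilde V(x,y)\Big[-\frac{\psi'(x)}{\psi(x)} + x\big(1 - (1+\alpha)M\big)\Big]\\
&\leq \begin{cases}
-\widetilde V(x,y)\Big[\dfrac{2\alpha}{x_0(1+\alpha)} + x\Big(1 - (1+\alpha)M - \dfrac{2\alpha}{x_0^2(1+\alpha)}\Big)\Big] & \text{when } x \leq x_0,\\[2ex]
-\widetilde V(x,y)\,x\,\big(1 - (1+\alpha)M\big) & \text{when } x \geq x_0.
\end{cases}
\end{aligned}$$

Finally, as soon as $(1+\alpha)M < 1$, one has

$$\mathfrak{L}\widetilde V(x,y) \leq -\lambda_{\alpha,x_0}\,\widetilde V(x,y) \qquad \forall x,y > 0$$

with

$$\lambda_{\alpha,x_0} = \min\Big(x_0\big(1 - (1+\alpha)M\big),\ \frac{2\alpha}{x_0(1+\alpha)}\Big).$$

By direct computations, one gets that the best possible choice of parameters $\alpha$ and $x_0$
is $\alpha = 1/\sqrt{M} - 1 \approx 0.0934$ and $x_0 = \sqrt{2}$. We obtain finally, for any $x,y > 0$,

$$\mathfrak{L}\widetilde V(x,y) \leq -\lambda\,\widetilde V(x,y), \tag{9}$$

with $\lambda = \lambda_{\alpha,x_0} = \sqrt{2}\,(1 - \sqrt{M}) \approx 0.1208$. Hence, directly from (9), for any $x,y > 0$,

$$\mathbb{E}_{x,y}\big[\widetilde V(X_t,Y_t)\big] \leq e^{-\lambda t}\,\widetilde V(x,y). \tag{10}$$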
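The choice of the two parameters can be sanity-checked numerically. The sketch below assumes the rate has the form $\lambda_{\alpha,x_0} = \min\big(x_0(1-(1+\alpha)M),\ 2\alpha/(x_0(1+\alpha))\big)$ and compares a coarse grid search with the stated optimum $\alpha = 1/\sqrt{M} - 1$, $x_0 = \sqrt{2}$; all Python names are hypothetical.

```python
import math

M = math.sqrt(2.0) * (3.0 + math.sqrt(3.0)) / 8.0


def rate(alpha, x0):
    """min(x0 (1 - (1 + alpha) M), 2 alpha / (x0 (1 + alpha)))."""
    return min(x0 * (1.0 - (1.0 + alpha) * M), 2.0 * alpha / (x0 * (1.0 + alpha)))


ALPHA_STAR = 1.0 / math.sqrt(M) - 1.0
X0_STAR = math.sqrt(2.0)
LAMBDA_STAR = math.sqrt(2.0) * (1.0 - math.sqrt(M))


def grid_search(n=400):
    """Coarse grid search over (alpha, x0), only as a sanity check."""
    best = (0.0, None, None)
    for i in range(1, n):
        alpha = i * (1.0 / M - 1.0) / n  # keeps (1 + alpha) M < 1
        for j in range(1, n):
            x0 = 3.0 * j / n
            best = max(best, (rate(alpha, x0), alpha, x0), key=lambda c: c[0])
    return best
```

At the optimum the two terms inside the minimum coincide, which is why `rate(ALPHA_STAR, X0_STAR)` agrees with $\sqrt{2}(1-\sqrt{M})$ up to rounding.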
Immediate manipulations lead to the following estimate. 
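Proposition 3.1. Let

$$M = \frac{\sqrt{2}\,(3+\sqrt{3})}{8} \approx 0.84 \quad\text{and}\quad \lambda = \sqrt{2}\,(1 - \sqrt{M}) \approx 0.12. \tag{11}$$

Then, for any $x,y > 0$, one has

$$\mathbb{E}_{x,y}\big(|X_t - Y_t|^{1/2}\big) \leq \frac{1}{\sqrt{M}}\,e^{-\lambda t}\,|x-y|^{1/2}. \tag{12}$$
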

Moreover, for any initial conditions {Xq,Yq) and any t ^ tQ > 0, 



M 



(9) 



(10) 



(11) 



(12) 



(13) 



Proof. Equation (12) is a straightforward consequence of Equation (10) since $\psi(\mathbb{R}_+)$ is the
interval $[1, 1+\alpha]$. As a consequence, for any $t \geq t_0 > 0$, one has



Then, Lemma 2.1 provides the estimate (13). 



□ 



Remark 3.2. The upper bound obtained in Proposition 3.1 is compared graphically to
the "true" function $t \mapsto \mathbb{E}_{x,y}(|X_t - Y_t|^{1/2})$ in Figure 2. By linear regression on this data,
one gets that the exponential speed of convergence of this function is on the order of 0.4.
Note also that this method can be adapted to any $V_p$ with $0 < p < 1$, giving even better
(but less explicit) speed of convergence for some $p \neq 1/2$: we estimated numerically that
the best value for $\lambda$ would be approximately 0.1326, obtained for $p$ close to 2/3.

We may now deduce from Proposition 3.1 estimates for the Wasserstein distance
between the laws of $X_t$ and $Y_t$.

Figure 2: The "true" function $\mathbb{E}_{x,y}(|X_t - Y_t|^{1/2})$ (solid, black; by Monte Carlo method
with m = 10 000 copies) and the bound given in Equation (12) (dashed, blue), for x = 2
and y = 10.



Theorem 3.3. Let $p \geq 1$. Then, for any $t_0 > 0$ and any $\theta \in (0,1)$, there exists
$C(p,t_0,\theta) < +\infty$ such that, for any initial conditions $(X_0,Y_0)$ and for all $t \geq t_0$,

$$\mathbb{E}\big(|X_t - Y_t|^p\big) \leq C(p,t_0,\theta)\,\exp(-\lambda\theta t)$$

where $\lambda$ is defined by (11).



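Proof of Theorem 3.3. Let $p \geq 1$. For any $0 < \theta < 1$, Hölder's inequality gives, for any
$t \geq 0$,

$$\mathbb{E}\big(|X_t - Y_t|^p\big) \leq \Big[\mathbb{E}\Big(|X_t - Y_t|^{\frac{2p-\theta}{2(1-\theta)}}\Big)\Big]^{1-\theta}\,\Big[\mathbb{E}\big(|X_t - Y_t|^{1/2}\big)\Big]^{\theta}.$$

Thanks to Lemma 2.1 and the inequality $(a+b)^q \leq 2^{q-1}(a^q + b^q)$, when $q \geq 1$, one gets

$$\Big[\mathbb{E}\Big(|X_t - Y_t|^{\frac{2p-\theta}{2(1-\theta)}}\Big)\Big]^{1-\theta} \leq \Big(2^{\frac{2p-\theta}{2(1-\theta)}}\,M_{\frac{2p-\theta}{2(1-\theta)},\,t}\Big)^{1-\theta} \leq \Big[2\Big(\sqrt{\frac{2p-\theta}{1-\theta}} + \frac{2p-\theta}{(1-\theta)\,t}\Big)\Big]^{p-\theta/2}.$$
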

Then, it suffices to use Equation ( 13 ) to conclude the proof with 

\ P-e/2 



Cip, to, 9) 



2P 



2p-9 2p 
1-9 ^ 



1 - 0)to 



9 \ e/2 

2 + -) e^^*« 
to J 



□ 



Since $W_p(\mathcal{L}(X_t), \mathcal{L}(Y_t))^p \leq \mathbb{E}(|X_t - Y_t|^p)$, Theorem 1.4 is a direct consequence of this
result.

Remark 3.4. Let us remark that we can obtain "contraction type bounds" using
Equation (12) instead of (13): for any $p \geq 1$, any $0 < \theta < 1$, any $t \geq 0$ and any $x,y > 0$,

,9/2 



p-e/2 



^-xet 1^ _ y| 



We then obtain that if $\nu$ and $\tilde\nu$ have finite $\theta/2$-moments then for $p \geq 1$ and $t \geq 0$,



1 



2 



which still allows a control by some Wasserstein "distance" (in fact, this is not a distance,
since $\theta/2 < 1$) of the initial measures.

Remark 3.5. We estimated numerically the exponential speed of convergence:

• of the function $t \mapsto \mathbb{E}_{x,y}(|X_t - Y_t|)$ for the Wasserstein coupling (by Monte Carlo
method and linear regression). It seems to be on the order of 0.5 (we obtained 0.48
for x = 2, y = 10, m = 10 000 copies, and linear regression between times 2 and 10);

• of the Wasserstein distance $t \mapsto W_1(\delta_x P_t, \delta_y P_t)$, using the explicit representation of
this distance for measures on $\mathbb{R}$ to approximate it by the $L^1$-distance between the
two empirical quantile functions. It is on the order of 1.6 (we get 1.67 for x = 2,
y = 0.5, m = 1 000 000 copies, and linear regression on 20 points until time 4).
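The quantile-based approximation of $W_1$ mentioned in the second item can be sketched as follows. This is an illustrative helper, not the authors' code: the name `empirical_w1` is hypothetical and the sketch assumes both samples have the same size, in which case the $L^1$-distance between empirical quantile functions reduces to the mean absolute difference of order statistics.

```python
def empirical_w1(sample_a, sample_b):
    """W_1 between two empirical measures with the same number of atoms.

    On the real line the optimal coupling pairs order statistics, so W_1 is the
    mean absolute difference between the sorted samples (the L^1 distance
    between the two empirical quantile functions).
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal sample sizes")
    a, b = sorted(sample_a), sorted(sample_b)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)
```

Feeding it two independent batches of simulated values of $X_t$ started from $x$ and from $y$ gives a Monte Carlo estimate of $W_1(\delta_x P_t, \delta_y P_t)$, whose logarithm can then be regressed against $t$.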



In conclusion, our bound from Theorem 3.3 seems reasonable (at least when compared
to those given by [RR96], see Section 4.2 below), but is still off by a factor of 4 from
the true exponential speed of convergence. Since the coupling itself seems to converge
approximately 3 times slower than the true Wasserstein distance, one probably needs to
find another coupling to get better bounds.

Remark 3.6. Let us end this section by mentioning a parallel work by B. Cloez [Clo12]
who uses a completely different approach, based on a particular Feynman-Kac formulation,
to get some related results.

4 Exponential convergence in total variation distance 



In this section, we provide the proof of Theorem 1.5 and we compare our estimate to the
ones that can be deduced from [RR96].

4.1 From Wasserstein to total variation estimate

The fact that the evolution is partially deterministic makes the study of convergence in
total variation quite subtle. Indeed, the law $\delta_x P_t$ can be written as a mixture of a Dirac
mass at $x + t$ (the first jump time has not occurred) and an absolutely continuous measure
(the process jumped at some random time in $(0,t)$):

$$\delta_x P_t = p_t(x)\,\delta_{x+t} + (1 - p_t(x))\,\mathcal{L}(X_t \mid X_0 = x,\ T_1 \leq t)$$

where, according to Equation (2),

$$p_t(x) := \mathbb{P}_x(T_1 > t) = e^{-t^2/2 - xt}. \tag{14}$$

This implies that the map $y \mapsto \|\delta_x P_t - \delta_y P_t\|_{TV}$ is not continuous at point $x$ since one
has, for any $y \neq x$,

$$\|\delta_x P_t - \delta_y P_t\|_{TV} \geq p_t(x) \vee p_t(y) = e^{-t^2/2 - (x\wedge y)t}.$$

Nevertheless, one can hope that

The lemma below makes this intuition more precise. 

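Lemma 4.1. Let $t \geq \varepsilon \geq x - y > 0$. There exists a coupling $((X_t)_{t\geq 0}, (Y_t)_{t\geq 0})$ of two TCP
processes driven by (1) and starting at $(x,y)$ such that the probability $\mathbb{P}(X_s = Y_s,\ s \geq t)$
is larger than

$$q_t(x,y) := \Big(\int_{x-y}^{t} \big(f_x(s)\wedge f_y(s - x + y)\big)\,ds\Big)\, p_{x-y}\Big(\frac{x+t}{2}\Big) \geq \big(p_\varepsilon(x) - p_t(x) - 2\varepsilon\,\alpha(x)\big)\, p_\varepsilon\Big(\frac{x+t}{2}\Big), \tag{15}$$

where $f_x$ is defined in Equation (2) and $\alpha(x) := \int_0^\infty e^{-u^2/2 - ux}\,du$.
Moreover, for any $x_0 > 0$ and $\varepsilon > 0$, let us define

$$A_{x_0,\varepsilon} = \{(x,y) : 0 \leq x, y \leq x_0,\ |x-y| \leq \varepsilon\}. \tag{16}$$

Then,

$$\inf_{(x,y)\in A_{x_0,\varepsilon}} q_t(x,y) \geq \exp\Big(-\varepsilon^2 - \frac{3x_0 + t}{2}\,\varepsilon\Big) - e^{-t^2/2} - \sqrt{2\pi}\,\varepsilon.$$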

Proof. The idea is to construct an explicit coalescent coupling starting from $x$ and $y$. The
main difficulty comes from the fact that the jumps are deterministic. Assume for simplicity
that $y < x$. Let us denote by $(T_k^x)_{k\geq 1}$ and $(T_k^y)_{k\geq 1}$ the jump times of the two processes. If

$$T_1^x = T_1^y + x - y \quad\text{and}\quad T_2^y - T_1^y \geq x - y \tag{17}$$

then the two paths are at the same place at time $T_1^x$ since in this case

$$X_{T_1^x} = \frac{x + T_1^x}{2} = \frac{y + T_1^y}{2} + x - y = Y_{T_1^x}.$$

The law of $T_1^x$ has a density $f_x$ given by (2). As a consequence, the density of $T_1^y + x - y$
is given by $s \mapsto f_y(s - x + y)$. The best way of obtaining the first equality in (17) before
a time $t \geq x - y$ is to realize an optimal coupling of the two continuous laws of $T_1^x$ and
$T_1^y + x - y$. This makes these random variables equal with probability

$$I(x,y,t) = \int_{x-y}^{t} \big(f_x(s)\wedge f_y(s - x + y)\big)\,ds.$$

Assume now that $0 \leq x - y \leq \varepsilon \leq t$. For any $s \geq x - y$, one has

$$\begin{aligned}
f_y(s - x + y) &= (2y - x + s)\,\exp\Big(-\frac{1}{2}(s - x + y)^2 - y\,(s - x + y)\Big)\\
&= \big(s + x - 2(x-y)\big)\,\exp\Big(-\frac{s^2}{2} - xs + \big(2s + x - 3(x-y)/2\big)(x-y)\Big)\\
&\geq (s + x - 2\varepsilon)\,\exp\Big(-\frac{s^2}{2} - xs\Big).
\end{aligned}$$

As a consequence, if $0 \leq x - y \leq \varepsilon \leq t$,

$$I(x,y,t) \geq \int_{x-y}^{t} (s + x - 2\varepsilon)\,\exp\Big(-\frac{s^2}{2} - xs\Big)\,ds \geq p_\varepsilon(x) - p_t(x) - 2\varepsilon\,\alpha(x)$$

where $p_t(x)$ is defined in (14) and $\alpha(x) = \int_0^\infty e^{-u^2/2 - ux}\,du$.

Finally, one has to get a lower bound for the probability of the set $\{T_2^y - T_1^y \geq x - y\}$.
For this, we notice that

$$z \in [0,+\infty) \mapsto \mathbb{P}_z(T_1 \geq s) = p_s(z) = e^{-s^2/2 - sz}$$

is decreasing and that $Y_{T_1^y} \leq (y+t)/2$ as soon as $T_1^y \leq t$. As a consequence,

$$\inf_{0\leq z\leq (y+t)/2} \mathbb{P}_z(T_1 \geq x - y) \geq p_{x-y}\Big(\frac{y+t}{2}\Big) \geq p_{x-y}\Big(\frac{x+t}{2}\Big).$$

This provides the bound (15). The uniform lower bound on the set $A_{x_0,\varepsilon}$ is a direct
consequence of the previous one. □



Let us now turn to the proof of Theorem 1.5.

Proof of Theorem 1.5. We are looking for an upper bound for the total variation distance
between $\delta_x P_t$ and $\delta_y P_t$ for two arbitrary initial conditions $x$ and $y$. To this end, let us
consider the following coupling: during a time $t_1 < t$ we use the Wasserstein coupling.
Then during a time $t_2 = t - t_1$ we try to stick the two paths using Lemma 4.1. Let $\varepsilon > 0$
and $x_0 > 0$ be as in Lemma 4.1. If, after the time $t_1$, one has

$$(X_{t_1}, Y_{t_1}) \in A_{x_0,\varepsilon},$$

where $A_{x_0,\varepsilon}$ is defined by (16), then the coalescent coupling will work with a probability
greater than the one provided by Lemma 4.1. As a consequence, the coalescent coupling
occurs before time $t_1 + t_2$ with a probability greater than

$$\inf_{(x,y)\in A_{x_0,\varepsilon}} q_{t_2}(x,y).$$

Moreover,

$$\mathbb{P}\big((X_{t_1},Y_{t_1}) \notin A_{x_0,\varepsilon}\big) \leq \mathbb{P}(X_{t_1} \geq x_0) + \mathbb{P}(Y_{t_1} \geq x_0) + \mathbb{P}(|X_{t_1} - Y_{t_1}| \geq \varepsilon).$$

From the deviation bound (7), we get that, for any $x_0 \geq 2e(1 + 1/t_1)$,

$$\mathbb{P}(X_{t_1} \geq x_0) \leq e^{-a(t_1)x_0} \quad\text{with}\quad a(t_1) = \frac{t_1}{2e(t_1+1)}.$$

The estimate of Proposition 3.1 concerning the Wasserstein coupling ensures that 



C 



t,-Yt,\>e)i^^e~^'' with C 



M 



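for any $0 < t_0 \leq t_1$. As a consequence, the total variation distance between $\delta_x P_t$ and $\delta_y P_t$
is smaller than

$$D := 1 - e^{-\varepsilon^2 - \frac{3x_0 + t_2}{2}\varepsilon} + e^{-t_2^2/2} + \sqrt{2\pi}\,\varepsilon + 2e^{-a(t_1)x_0} + \frac{C}{\sqrt{\varepsilon}}\,e^{-\lambda t_1}.$$

In order to get a more tractable estimate, let us assume that $t_2 \leq x_0$ and use that
$1 - e^{-u} \leq u$ to get

$$D \leq \varepsilon^2 + 2\varepsilon x_0 + e^{-t_2^2/2} + \sqrt{2\pi}\,\varepsilon + 2e^{-a(t_0)x_0} + \frac{C}{\sqrt{\varepsilon}}\,e^{-\lambda t_1}.$$

Finally let us set

$$t_2 = \sqrt{2\log(1/\varepsilon)}, \qquad x_0 = \frac{1}{a(t_0)}\log(1/\varepsilon), \qquad t_1 = \frac{3}{2\lambda}\log(1/\varepsilon).$$

Obviously, for $\varepsilon$ small enough, $x_0 \geq \max(t_2, 2e(1 + 1/t_0))$ and $t_1 \geq t_0$. Then, one gets that

$$\|\delta_x P_{t_1+t_2} - \delta_y P_{t_1+t_2}\|_{TV} \leq \frac{2}{a(t_0)}\,\varepsilon\log(1/\varepsilon) + \big(3 + C + \sqrt{2\pi} + \varepsilon\big)\,\varepsilon.$$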



One can now express $\varepsilon$ as a function of $t_1$ to get that there exists $K = K(t_0) > 0$ such
that

Since $t_2 = \sqrt{(4\lambda/3)\,t_1}$, one gets that



This provides the conclusion of Theorem 1.5 when the initial measures are Dirac masses.
The generalisation is straightforward. □

4.2 A bound via small sets 

We describe here briefly the approach of [RR96] and compare it with the hybrid Wasser- 
stein/total variation coupling described above. The idea is once more to build a successful 
coupling between two copies X and Y of the process. In algorithmic terms, the approach 
is the following: 

• let X and Y evolve independently until they both reach a given set C, 

• once they are in C, try to stick them together, 

• repeat until the previous step is successful. 

To control the time to come back to $C \times C$, [RR96] advocates an approach via a Lyapunov
function. The second step works with positive probability if the set is "pseudo-small", i.e.
if one can find a time $t_*$, an $\alpha > 0$ and probability measures $\nu_{xy}$ such that

$$\forall (x,y) \in C^2, \quad \mathcal{L}(X_{t_*} \mid X_0 = x) \geq \alpha\,\nu_{xy} \quad\text{and}\quad \mathcal{L}(X_{t_*} \mid X_0 = y) \geq \alpha\,\nu_{xy}. \tag{18}$$

The convergence result can be stated as follows.

Theorem 4.2 ([RR96], Theorem 3, Corollary 4 and Theorem 8). Suppose that there
exists a set $C$, a function $V \geq 1$, and positive constants $\lambda$, $\Lambda$ such that

$$LV \leq -\lambda V + \Lambda\,\mathbf{1}_C. \tag{19}$$

Suppose that $\delta = \lambda - \Lambda/\inf_{x\notin C} V > 0$. Suppose additionally that $C$ is pseudo-small, with
constant $\alpha$.

Then for $A = \frac{\Lambda}{\delta} + e^{-\delta t_*}\sup_{x\in C} V$, $A' = A e^{\delta t_*}$, and for any $r < 1/t_*$,

$$\|\mathcal{L}(X_t) - \mu\|_{TV} \leq (1-\alpha)^{\lfloor rt\rfloor} + e^{-\delta(t - t_*)}\,(A')^{\lfloor rt\rfloor - 1}\,\mathbb{E}V(X_0).\ ^1$$

$^1$In fact, in [RR96], the result is given with $A$ instead of $A'$ in the upper bound. Joaquin Fontbona
pointed out to us that Lemma 6 from [RR96] has to be corrected, adding the exponential term $e^{\delta t_*}$ to the
estimate. We thank him for this remark.

If $A'$ is finite, this gives exponential convergence: just choose $r$ small enough so that
$(A')^{rt}e^{-\delta t}$ decreases exponentially fast.

To compute explicit bounds we have to make choices for C and V and estimate the 
corresponding value of a. Our best efforts for the case of the TCP process only give decay 
rates of the order 10~^^. We believe this order cannot be substantially improved even by 
fine-tuning C and V. 

5 Two other models 

This section is devoted to the study of two simple PDMPs. The first one is a simplified
version of the TCP process where the jump rate is assumed to be constant and equal
to $\lambda$. It has been studied with different approaches: PDE techniques (see [PR05, LP09])
or probabilistic tools (see [LL08, OK08, CMP10]). The second one is studied in [RT00].
It can also be checked that our method gives sharp bounds for the speed of convergence
to equilibrium of the PDMP which appears in the study of a penalized bandit algorithm
(see Lemma 7 in [LP08]).

5.1 The TCP model with constant jump rate 

In this section we investigate the long time behavior of the TCP process with constant
jump rate given by its infinitesimal generator:

$$Lf(x) = f'(x) + \lambda\,\big(f(x/2) - f(x)\big) \qquad (x \geq 0).$$

The jump times of this process are the ones of a homogeneous Poisson process with
intensity $\lambda$. The convergence in Wasserstein distance is obvious.

Lemma 5.1 ([PR05, CMP10]). For any $p \geq 1$,

$$W_p(\delta_x P_t, \delta_y P_t) \leq |x-y|\,e^{-\lambda_p t} \quad\text{with}\quad \lambda_p = \frac{\lambda\,(1 - 2^{-p})}{p}. \tag{20}$$

Remark 5.2. The case $p = 1$ is obtained in [PR05] by PDE estimates using the following
alternative formulation of the Wasserstein distance on $\mathbb{R}$. If the cumulative distribution
functions of the two probability measures $\nu$ and $\tilde\nu$ are $F$ and $\tilde F$ then

$$W_1(\nu,\tilde\nu) = \int_{\mathbb{R}} |F(x) - \tilde F(x)|\,dx.$$

The general case $p \geq 1$ is obvious from the probabilistic point of view: choosing the
same Poisson process $(N_t)_{t\geq 0}$ to drive the two processes provides that the two coordinates
jump simultaneously and

$$|X_t - Y_t| = |x-y|\,2^{-N_t}.$$

As a consequence, since the law of $N_t$ is the Poisson distribution with parameter $\lambda t$, one
has

$$\mathbb{E}_{x,y}\big(|X_t - Y_t|^p\big) = |x-y|^p\,\mathbb{E}\big(2^{-pN_t}\big) = |x-y|^p\,e^{-p\lambda_p t}.$$

This coupling turns out to be sharp. Indeed, one can compute explicitly the moments
of $X_t$ (see [LL08, OK08]): for every $n \geq 0$, every $x \geq 0$, and every $t \geq 0$,

'■'-K—L m=l ^ fc=0 ]=k ■' 

where = A(l — 2^") = nA„ for any n ^ 1. Obviously, assuming for example that x > y, 



Wn{6^Pt,SyPtr ^ E^iiXtD-EyiiYt) 

/ ^ k k 



fc=0 j=k "-^ 



As a consequence, the rate of convergence in Equation (20) is optimal for any n ^ 1. 
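The exponent in (20) rests on the Poisson identity $\mathbb{E}(2^{-pN_t}) = \exp(-\lambda t(1-2^{-p}))$. The sketch below checks it by summing the Poisson series directly; it is only a numerical illustration, and the names `poisson_mgf_base2` and `wasserstein_rate` are hypothetical.

```python
import math


def poisson_mgf_base2(p, lam, t, terms=200):
    """E(2^{-p N_t}) for N_t ~ Poisson(lam * t), summed term by term."""
    mean = lam * t
    total, weight = 0.0, math.exp(-mean)  # weight = P(N_t = k), starting at k = 0
    for k in range(terms):
        total += weight * 2.0 ** (-p * k)
        weight *= mean / (k + 1)
    return total


def wasserstein_rate(p, lam):
    """lambda_p = lam (1 - 2^{-p}) / p, the exponent in (20)."""
    return lam * (1.0 - 2.0 ** (-p)) / p
```

Taking the $p$-th root of $|x-y|^p e^{-p\lambda_p t}$ then gives the bound $|x-y|e^{-\lambda_p t}$ on $W_p$.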

Nevertheless this estimate for the Wasserstein rate of convergence does not provide
on its own any information about the total variation distance between $\delta_x P_t$ and $\delta_y P_t$. It
turns out that this rate of convergence is the one of the $W_1$ distance. This is established
by Theorem 1.1 in [PR05]. It can be reformulated in our setting as follows.

Theorem 5.3 ([PR05]). Let $\mu$ be the invariant measure of $X$. For any measure $\nu$ with a
finite first moment and $t \geq 0$,

$$\|\nu P_t - \mu\|_{TV} \leq e^{-\lambda t/2}\,\big(3\lambda\,W_1(\nu,\mu) + \|\nu - \mu\|_{TV}\big).$$
Let us provide here an improvement of this result by a probabilistic argument.

Proposition 5.4. For any $x,y \geq 0$ and $t \geq 0$,

$$\|\delta_x P_t - \delta_y P_t\|_{TV} \leq \lambda\,e^{-\lambda t/2}\,|x-y| + e^{-\lambda t}. \tag{22}$$

As a consequence, for any measure $\nu$ with a finite first moment and $t \geq 0$,

$$\|\nu P_t - \mu\|_{TV} \leq \lambda\,e^{-\lambda t/2}\,W_1(\nu,\mu) + e^{-\lambda t}\,\|\nu - \mu\|_{TV}. \tag{23}$$



Remark 5.5. Note that the upper bound obtained in Equation (22) is non-null even for
$x = y$. This is due to the persistence of a Dirac mass at any time, which implies that taking
$y$ arbitrarily close to $x$ for initial conditions does not make the total variation distance
arbitrarily small, even for large times.



Proof of Proposition 5.4. The coupling is a slight modification of the one used to control
Wasserstein distance. The paths of $(X_s)_{0\leq s\leq t}$ and $(Y_s)_{0\leq s\leq t}$ starting respectively from $x$
and $y$ are determined by their jump times $(T_n^X)_{n\geq 0}$ and $(T_n^Y)_{n\geq 0}$ up to time $t$. These
sequences have the same distribution as the jump times of a Poisson process with
intensity $\lambda$.

Let $(N_t)_{t\geq 0}$ be a Poisson process with intensity $\lambda$ and $(T_n)_{n\geq 0}$ its jump times with the
convention $T_0 = 0$. Let us now construct the jump times of $X$ and $Y$. Both processes
make exactly $N_t$ jumps before time $t$. If $N_t = 0$, then

$$X_s = x + s \quad\text{and}\quad Y_s = y + s \quad\text{for } 0 \leq s \leq t.$$

Assume now that $N_t \geq 1$. The $N_t - 1$ first jump times of $X$ and $Y$ are the ones of $(N_t)_{t\geq 0}$:

$$T_k^X = T_k^Y = T_k, \qquad 0 \leq k \leq N_t - 1.$$

In other words, the coupling used to control Wasserstein distance (see Lemma 5.1) acts
until the penultimate jump time $T_{N_t-1}$. At that time, we have

$$X_{T_{N_t-1}} - Y_{T_{N_t-1}} = \frac{x-y}{2^{N_t-1}}.$$

Then we have to define the last jump time for each process. If they are such that

$$T^X_{N_t} = T^Y_{N_t} + X_{T_{N_t-1}} - Y_{T_{N_t-1}},$$

then the paths of $X$ and $Y$ are equal on the interval $(T^X_{N_t}\vee T^Y_{N_t},\, t)$ and can be chosen to be equal
for any time larger than $t$.

Recall that conditionally on the event $\{N_t = 1\}$, the law of $T_1$ is the uniform
distribution on $(0,t)$. More generally, if $n \geq 2$, conditionally on the set $\{N_t = n\}$, the law of the
penultimate jump time $T_{n-1}$ has a density $s \mapsto n(n-1)\,t^{-n}(t-s)\,s^{n-2}\,\mathbf{1}_{(0,t)}(s)$ and
conditionally on the event $\{N_t = n,\ T_{n-1} = s\}$, the law of $T_n$ is uniform on the interval $(s,t)$.

Conditionally on $N_t = n \geq 1$ and $T_{n-1}$, $T_n^X$ and $T_n^Y$ are uniformly distributed on
$(T_{n-1}, t)$ and can be chosen such that

$$\mathbb{P}\Big(T_n^X = T_n^Y + \frac{x-y}{2^{n-1}} \,\Big|\, N_t = n,\ T_{n-1}\Big) = \Big(1 - \frac{|x-y|}{2^{n-1}(t - T_{n-1})}\Big)\vee 0 \geq 1 - \frac{|x-y|}{2^{n-1}(t - T_{n-1})}.$$

This coupling provides that

$$\|\delta_x P_t - \delta_y P_t\|_{TV} \leq 1 - \mathbb{E}\Big[\Big(1 - \frac{|x-y|}{2^{N_t-1}(t - T_{N_t-1})}\Big)\mathbf{1}_{\{N_t\geq 1\}}\Big] \leq e^{-\lambda t} + |x-y|\,\mathbb{E}\Big[\frac{2^{-N_t+1}}{t - T_{N_t-1}}\,\mathbf{1}_{\{N_t\geq 1\}}\Big].$$

For any $n \geq 2$,

$$\mathbb{E}\Big(\frac{1}{t - T_{N_t-1}} \,\Big|\, N_t = n\Big) = \frac{n(n-1)}{t^n}\int_0^t u^{n-2}\,du = \frac{n}{t}.$$

This equality also holds for $n = 1$. Thus we get that

$$\mathbb{E}\Big[\frac{2^{-N_t+1}}{t - T_{N_t-1}}\,\mathbf{1}_{\{N_t\geq 1\}}\Big] = \frac{1}{t}\,\mathbb{E}\big(N_t\,2^{-N_t+1}\big) = \lambda\,e^{-\lambda t/2}$$

since $N_t$ is distributed according to the Poisson law with parameter $\lambda t$. This provides the
estimate (22).
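The last step of this argument is a Poisson computation, namely $\mathbb{E}(N_t\,2^{-N_t+1})/t = \lambda e^{-\lambda t/2}$ for $N_t$ Poisson with parameter $\lambda t$. The following sketch verifies it numerically by summing the series; the function names are hypothetical and the right-hand side of (22) is taken to be $\lambda e^{-\lambda t/2}|x-y| + e^{-\lambda t}$.

```python
import math


def tv_bound_constant_rate(gap, lam, t):
    """Right-hand side of (22): lam * exp(-lam t / 2) * gap + exp(-lam t)."""
    return lam * math.exp(-lam * t / 2.0) * gap + math.exp(-lam * t)


def expected_weight(lam, t, terms=200):
    """E[2^{-N_t + 1} * N_t / t] for N_t ~ Poisson(lam t), summed term by term."""
    mean = lam * t
    total, weight = 0.0, math.exp(-mean)  # weight = P(N_t = n), starting at n = 0
    for n in range(terms):
        total += weight * 2.0 ** (1 - n) * n / t
        weight *= mean / (n + 1)
    return total
```

The term `exp(-lam * t)` in the bound is the probability that no jump occurs before time $t$, in which case the two Dirac masses cannot be merged.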



To treat the case of general initial conditions and to get (23), we combine the coupling
between the dynamics constructed above with the choice of the coupling of the initial
measures $\mu$ and $\nu$ as a function of the underlying Poisson process $(N_t)_{t\geq 0}$: the time horizon
$t > 0$ being fixed, if $N_t = 0$, one chooses for $\mathcal{L}(X_0, Y_0)$ the optimal total variation coupling
of $\nu$ and $\mu$; if $N_t \geq 1$, one chooses their optimal Wasserstein coupling. One checks easily
that this gives an admissible coupling, in the sense that its first (resp. second) marginal
is a constant rate TCP process with initial distribution $\nu$ (resp. $\mu$). And one gets with
this construction, using the same estimates as above in the case where $N_t \geq 1$:

$$\begin{aligned}
\mathbb{P}(X_t \neq Y_t) &= \mathbb{P}(X_t \neq Y_t,\ N_t \geq 1) + \mathbb{P}(X_t \neq Y_t,\ N_t = 0)\\
&\leq \mathbb{E}\Big[|X_0 - Y_0|\,\frac{2^{-N_t+1}}{t - T_{N_t-1}}\,\mathbf{1}_{\{N_t\geq 1\}}\Big] + \mathbb{P}(X_0 \neq Y_0,\ N_t = 0)
\end{aligned}$$

which clearly implies (23). □



5.2 A storage model example 

In [RT00], Roberts and Tweedie improve the approach from [RR96] via Lyapunov functions 
and minorization conditions in the specific case of stochastically monotone processes. 
They get better results on the speed of convergence to equilibrium in this case. They give 
the following example of a storage model as a good illustration of the efficiency of their 
method. The process $(X_t)_{t \ge 0}$ on $\mathbb{R}_+$ is driven by the generator 

\[ Lf(x) = -\beta x f'(x) + \alpha \int_0^\infty \big(f(x+y) - f(x)\big) e^{-y}\,dy. \]

In words, the current stock $X_t$ decreases exponentially at rate $\beta$, and increases at random 
exponential times by a random (exponential) amount. Let us introduce a Poisson process 
$(N_t)_{t \ge 0}$ with intensity $\alpha$ and jump times $(T_i)_{i \ge 0}$ (with $T_0 = 0$) and a sequence $(E_i)_{i \ge 1}$ of 
independent random variables with law $\mathcal{E}(1)$ independent of $(N_t)_{t \ge 0}$. The process $(X_t)_{t \ge 0}$ 
starting from $x \ge 0$ can be constructed as follows: for any $i \ge 0$, 

\[ X_t = \begin{cases} e^{-\beta(t-T_i)} X_{T_i} & \text{if } T_i \le t < T_{i+1}, \\ e^{-\beta(T_{i+1}-T_i)} X_{T_i} + E_{i+1} & \text{if } t = T_{i+1}. \end{cases} \]



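Proposition 5.6. For any $x, y \ge 0$ and $t \ge 0$, 

\[ W_p(\delta_x P_t, \delta_y P_t) \le |x-y|\, e^{-\beta t}, \]

and (when $\alpha \ne \beta$) 

\[ \|\delta_x P_t - \delta_y P_t\|_{\mathrm{TV}} \le e^{-\alpha t} + |x-y|\,\alpha\,\frac{e^{-\beta t} - e^{-\alpha t}}{\alpha - \beta}. \tag{24} \]

Moreover, if $\mu$ is the invariant measure of the process $X$, we have for any probability 
measure $\nu$ with a finite first moment and $t \ge 0$, 

\[ \|\nu P_t - \mu\|_{\mathrm{TV}} \le \|\nu - \mu\|_{\mathrm{TV}}\, e^{-\alpha t} + W_1(\nu, \mu)\,\alpha\,\frac{e^{-\beta t} - e^{-\alpha t}}{\alpha - \beta}. \]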



Remark 5.7. In the case $\alpha = \beta$, the upper bound (24) becomes 

\[ \|\delta_x P_t - \delta_y P_t\|_{\mathrm{TV}} \le (1 + |x-y|\,\alpha t)\, e^{-\alpha t}. \]

Remark 5.8 (Optimality). Applying $L$ to the test function $f(x) = x^n$ allows us to 
compute recursively the moments of $X_t$. In particular, 

\[ \mathbb{E}_x(X_t) = \frac{\alpha}{\beta} + \Big(x - \frac{\alpha}{\beta}\Big) e^{-\beta t}. \]

This relation ensures that the rate of convergence for the Wasserstein distance is sharp. 
Moreover, the coupling of total variation distance requires at least one jump. As a 
consequence, the exponential rate of convergence in total variation cannot be greater than $\alpha$. 
Thus, Equation (24) provides the optimal rate of convergence $\alpha \wedge \beta$. 
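
To make the rate $\alpha \wedge \beta$ concrete, the right-hand side of (24) can be evaluated numerically. The following sketch is illustrative only; the function name `tv_bound` and the argument `dist` (standing for $|x-y|$) are assumptions introduced here. It switches to the form of Remark 5.7 when the two rates coincide.

```python
import math

def tv_bound(dist, alpha, beta, t):
    """Right-hand side of (24); uses the Remark 5.7 form when alpha == beta."""
    if alpha == beta:
        return (1.0 + dist * alpha * t) * math.exp(-alpha * t)
    ratio = (math.exp(-beta * t) - math.exp(-alpha * t)) / (alpha - beta)
    return math.exp(-alpha * t) + dist * alpha * ratio

def empirical_rate(dist, alpha, beta, t):
    """Decay rate -log(bound) / t read off at a large time t."""
    return -math.log(tv_bound(dist, alpha, beta, t)) / t
```

For large `t`, `empirical_rate` should approach `min(alpha, beta)` whichever of the two rates is smaller, and `tv_bound` should vary continuously as `beta` approaches `alpha`.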

Remark 5.9 (Comparison with previous work). By way of comparison, the original 
method of [RT00] does not seem to give these optimal rates. The case $\alpha = 1$ and $\beta = 2$ 
is treated in that paper (as an illustration of Theorem 5.1), with explicit choices for the 
various parameters needed in this method. With these choices, in order to get the 
convergence rate, one first needs to compute a quantity (defined in Theorem 3.1), which 
turns out to be approximately 5.92. The result that applies is therefore the first part of 
Theorem 4.1 (Equation (27)), and the convergence rate is given by the constant $\beta$ defined by Equation 
(22). The computation gives the approximate value 0.05, which is off by a factor 20 from 
the optimal value $\alpha \wedge \beta = 1$. 



Proof of Proposition 5.6. Firstly, consider two processes $X$ and $Y$ starting respectively at 
$x$ and $y$ and driven by the same randomness (i.e. Poisson process and jumps). Then the 
distance between $X_t$ and $Y_t$ is deterministic: 

\[ X_t - Y_t = (x-y)\, e^{-\beta t}. \]

Obviously, for any $p \ge 1$ and $t \ge 0$, 

\[ W_p(\delta_x P_t, \delta_y P_t) \le |x-y|\, e^{-\beta t}. \]
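
The deterministic contraction under shared randomness is easy to observe in simulation. The sketch below is a hedged illustration (the function name `storage_pair` and its signature are assumptions): it runs two copies of the storage process with a common Poisson clock of intensity `alpha` and common $\mathcal{E}(1)$ jump heights, and returns both values at time `t`.

```python
import math
import random

def storage_pair(x, y, alpha, beta, t, rng):
    """Run X and Y up to time t with a shared Poisson clock and shared jumps."""
    now = 0.0
    while True:
        wait = rng.expovariate(alpha)        # time to the next common jump
        if now + wait > t:
            decay = math.exp(-beta * (t - now))
            return x * decay, y * decay      # pure decay until the horizon
        decay = math.exp(-beta * wait)
        jump = rng.expovariate(1.0)          # common Exp(1) jump height
        x, y = x * decay + jump, y * decay + jump
        now += wait
```

Whatever the realisation of the clock and of the jumps, the difference of the two returned values should equal `(x - y) * exp(-beta * t)` up to rounding.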



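Let us now construct explicitly a coupling at time $t$ to get the upper bound (24) for the 
total variation distance. The jump times of $(X_t)_{t \ge 0}$ and $(Y_t)_{t \ge 0}$ are the ones of a Poisson 
process $(N_t)_{t \ge 0}$ with intensity $\alpha$ and jump times $(T_i)_{i \ge 0}$. Let us now construct the jump 
heights $(E^X_i)_{1 \le i \le N_t}$ and $(E^Y_i)_{1 \le i \le N_t}$ of $X$ and $Y$ until time $t$. If $N_t = 0$, no jump occurs. 
If $N_t \ge 1$, we choose $E^X_i = E^Y_i$ for $1 \le i \le N_t - 1$ and $E^X_{N_t}$ and $E^Y_{N_t}$ in order to maximise 
the probability 

\[ \mathbb{P}\Big( X_{T_{N_t}^-} + E^X_{N_t} = Y_{T_{N_t}^-} + E^Y_{N_t} \,\Big|\, X_{T_{N_t}^-},\ Y_{T_{N_t}^-} \Big). \]

This maximal probability of coupling is equal to 

\[ \exp\big(-|X_{T_{N_t}^-} - Y_{T_{N_t}^-}|\big) = \exp\big(-|x-y|\, e^{-\beta T_{N_t}}\big) \ge 1 - |x-y|\, e^{-\beta T_{N_t}}. \]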
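
One way to realise this maximal coupling of the last jump heights uses the memoryless property of the exponential law. The sketch below is an illustrative construction, not necessarily the one the authors have in mind; the function name `coupled_jump_heights` and the argument `gap` (the pre-jump difference $X_{T^-} - Y_{T^-}$, assumed nonnegative) are assumptions. Both heights have law $\mathcal{E}(1)$, and the two processes merge with probability $e^{-\mathrm{gap}}$.

```python
import math
import random

def coupled_jump_heights(gap, rng):
    """Sample (ex, ey, merged): ex, ey ~ Exp(1), merged iff ey == ex + gap."""
    ex = rng.expovariate(1.0)
    if rng.random() < math.exp(-gap):
        # Success: ex + gap has the law of an Exp(1) variable given >= gap.
        return ex, ex + gap, True
    # Failure: ey is Exp(1) conditioned on [0, gap), by inverse transform.
    u = rng.random()
    ey = -math.log(1.0 - u * (1.0 - math.exp(-gap)))
    return ex, ey, False
```

In the failure branch `ey` is strictly smaller than `gap`, so the lower process cannot catch up with the upper one. The returned flag avoids testing floating-point equality.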



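As a consequence, we get that 

\[ \|\delta_x P_t - \delta_y P_t\|_{\mathrm{TV}} \le 1 - \mathbb{E}\Big[\big(1 - |x-y|\, e^{-\beta T_{N_t}}\big)\mathbf{1}_{\{N_t \ge 1\}}\Big] = e^{-\alpha t} + |x-y|\,\mathbb{E}\big[e^{-\beta T_{N_t}}\,\mathbf{1}_{\{N_t \ge 1\}}\big]. \]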



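The law of $T_n$ conditionally on the event $\{N_t = n\}$ has the density 

\[ u \mapsto \frac{n\,u^{n-1}}{t^n}\,\mathbf{1}_{(0,t)}(u). \]

This ensures that 

\[ \mathbb{E}\big[e^{-\beta T_{N_t}}\,\mathbf{1}_{\{N_t \ge 1\}}\big] = \int_0^1 \mathbb{E}\big[N_t\, v^{N_t-1}\big]\, e^{-\beta t v}\,dv. \]

Since the law of $N_t$ is the Poisson distribution with parameter $\alpha t$, one has 

\[ \mathbb{E}\big[N_t\, v^{N_t-1}\big] = \alpha t\, e^{\alpha t (v-1)}. \]

This ensures that 

\[ \mathbb{E}\big[e^{-\beta T_{N_t}}\,\mathbf{1}_{\{N_t \ge 1\}}\big] = \alpha\,\frac{e^{-\beta t} - e^{-\alpha t}}{\alpha - \beta}, \]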



which completes the proof. Finally, to get the last estimate, we proceed as follows: if Nt 
is equal to 0, a coupling in total variation of the initial measures is done, otherwise, we 
use the coupling above (the method is exactly the same as for the equivalent result in 
Proposition 5.4, see its proof for details). □ 



6 The case of diffusion processes 

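Let us consider the process $(X_t)_{t \ge 0}$ on $\mathbb{R}^d$ solution of 

\[ dX_t = A(X_t)\,dt + \sigma(X_t)\,dB_t, \tag{25} \]

where $(B_t)_{t \ge 0}$ is a standard Brownian motion on $\mathbb{R}^n$, $\sigma$ is a smooth function from $\mathbb{R}^d$ 
to $\mathcal{M}_{d,n}(\mathbb{R})$ and $A$ is a smooth function from $\mathbb{R}^d$ to $\mathbb{R}^d$. Let us denote by $(P_t)_{t \ge 0}$ the 
semi-group associated to $(X_t)_{t \ge 0}$. If $\nu$ is a probability measure on $\mathbb{R}^d$, $\nu P_t$ stands for the 
law of $X_t$ when the law of $X_0$ is $\nu$. 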

Under ergodicity assumptions, we are interested in getting quantitative rates of 
convergence of $\mathcal{L}(X_t)$ to its invariant measure in terms of classical distances (Wasserstein 
distances, total variation distance, relative entropy, ...). Remark that if $A$ is not in 
gradient form (even if $\sigma$ is constant), $X_t$ is not reversible and the invariant measure is usually 
unknown, so that it is quite difficult to use functional inequalities such as Poincaré or 
logarithmic Sobolev inequalities to get a quantitative rate of convergence in total variation or 
Wasserstein distance (using for example Pinsker's inequality or more generally 
transportation-information inequalities). Therefore the only general tool seems to be Meyn-Tweedie's 
approach, via small sets and Lyapunov functions, as explained in Section 4.2. However, 
we have seen that in practical examples the resulting estimate can be quite poor. 

The main goal of this short section is to recall the known results establishing the decay 
in Wasserstein distance and then to propose a strategy to derive control in total variation 
distance. 



6.1 Decay in Wasserstein distance 

The coupling approach to estimate the decay in Wasserstein distance was recently put 
forward, see [CGM08] and [BGM10] or [Ebe11]. It is robust enough to deal with nonlinear 
diffusions or hypoelliptic ones. In [BGG12], the authors approach the problem directly, 
by differentiating the Wasserstein distance along the flow of the SDE. 

Let us gather some of the results in these papers in the following statement. 

Proposition 6.1. 

1. Assume that there exists $\lambda > 0$ such that for $p > 1$ 

\[ \frac{p-1}{2}\,(\sigma(x) - \sigma(y))(\sigma(x) - \sigma(y))^T + (A(x) - A(y)) \cdot (x-y) \le -\lambda |x-y|^2, \quad x, y \in \mathbb{R}^d. \tag{26} \]

Then, for any $\nu, \nu' \in \mathcal{P}_p(\mathbb{R}^d)$, one has 

\[ W_p(\nu P_t, \nu' P_t) \le e^{-\lambda t}\, W_p(\nu, \nu'). \tag{27} \]

2. Assume that for all $x$, $\sigma(x) = \alpha\,\mathrm{Id}$ for some constant $\alpha$ and that $A$ is equal to $-\nabla U$ 
where $U$ is a function such that $\mathrm{Hess}(U) \le -K\,\mathrm{Id}$ outside a ball $B(0, r_0)$ and 
$\mathrm{Hess}(U) \le \rho\,\mathrm{Id}$ inside this ball for some positive $\rho$. Then there exist $c > 1$, $\alpha > 0$ 
such that 

\[ W_1(\nu P_t, \nu' P_t) \le c\, e^{-\alpha t}\, W_1(\nu, \nu'). \tag{28} \]

3. Suppose that the diffusion coefficient $\sigma$ is constant. Assume that $A$ is equal to $-\nabla U$ 
where $U$ is a $C^2$ convex function such that $\mathrm{Hess}(U) \ge \rho\,\mathrm{Id}$ outside a ball $B(0, r_0)$, with 
$\rho > 0$. Then there exist an invariant probability measure $\nu_\infty$ and $\alpha > 0$ such that 

\[ W_2(\nu P_t, \nu_\infty) \le e^{-\alpha t}\, W_2(\nu, \nu_\infty). \tag{29} \]

Proof. The first point is usually proved using a trivial coupling (and it readily extends to 
$p = 1$ in the case of constant diffusion coefficient), namely considering the same Brownian 
motion for two different solutions of the SDE starting with different initial measures. 
Note also that, in the case $p = 2$, the coercivity condition (26) is equivalent to the uniform 
contraction property (27) for the $W_2$ metric (see [vRS05]). 

The second point is due in this form to Eberle [Ebe11], using reflection coupling as 
presented by [LR86], used originally in this form by Chen and Wang to prove spectral gap 
estimates (see a nice short proof in [HSV11, Prop. 2.8]) in the reversible case. 

Finally, the third part was proved by [BGG12], establishing the dissipation of the 
Wasserstein distance by an adequate functional inequality. □ 
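
The "trivial coupling" of the first point can be illustrated numerically in dimension one. The sketch below is a hedged example: the Euler discretisation, the function name `synchronous_euler` and the particular drift are assumptions introduced here, not part of the cited results. Two copies of the SDE with constant $\sigma$ are driven by the same Brownian increments, so the noise cancels in their difference and only the drift acts on it.

```python
import math
import random

LAM = 1.0

def drift(x):
    # Hypothetical drift with (A(x) - A(y)) (x - y) <= -LAM |x - y|^2.
    return -LAM * x - x ** 3

def synchronous_euler(x, y, sigma, t, steps, rng):
    """Euler scheme for two copies of dX = A(X) dt + sigma dB, same noise."""
    dt = t / steps
    for _ in range(steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # shared Brownian increment
        x, y = x + drift(x) * dt + sigma * db, y + drift(y) * dt + sigma * db
    return x, y
```

With a small step size, the final distance `abs(x - y)` is expected to stay below the initial distance times `exp(-LAM * t)`, in line with (27) for $p = 1$ and constant $\sigma$.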

Remark 6.2. 

• Let us remark that the contraction property of the first two points 
ensures the existence of an invariant probability measure $\nu_\infty$ and an exponential rate 
of convergence towards this invariant measure in Wasserstein distance. 

• Note also that in the hypoelliptic case of a kinetic Fokker-Planck equation: 

\[ \begin{cases} dx_t = v_t\,dt \\ dv_t = dB_t - F(x_t)\,dt - V(v_t)\,dt \end{cases} \]

with $F(x) \sim ax$ and $V(v) \sim bv$ for positive $a$ and $b$, one has 

\[ W_2(\nu P_t, \nu' P_t) \le c\, e^{-\alpha t}\, W_2(\nu, \nu') \]

for $c > 1$ and a positive $\alpha$ (see [CGM08]). 

6.2 Total variation estimate 



If $\sigma = 0$ in Equation (25), the process $(X_t)_{t \ge 0}$ is deterministic and its invariant measure 
is a Dirac mass at the unique point $\bar{x} \in \mathbb{R}^d$ such that $A(\bar{x}) = 0$. As a consequence, for any 
$x \ne \bar{x}$, 

\[ \|\delta_x P_t - \delta_{\bar{x}}\|_{\mathrm{TV}} = 1 \quad\text{and}\quad H(\delta_x P_t \mid \delta_{\bar{x}}) = +\infty. \]

A non-zero variance is needed to get a convergence estimate in total variation distance. 
Classically, the Brownian motion creates regularity and density. There are many results 
giving regularity of the semigroup in small time, in terms of the initial points. Let us quote the 
following result of Wang, which holds for processes living on a manifold. 

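Lemma 6.3 ([Wan10]). Suppose that $\sigma$ is constant and denote by $\eta$ the infimum of its 
spectrum. If $A$ is a function such that 

\[ \frac{1}{2}\big(\mathrm{Jac}A + \mathrm{Jac}A^T\big) \le K\,\mathrm{Id}, \]

then there exists $K_\eta$ such that, for small $\varepsilon > 0$, 

\[ \|\delta_x P_\varepsilon - \delta_y P_\varepsilon\|_{\mathrm{TV}} \le K_\eta\,\frac{|x-y|}{\sqrt{\varepsilon}}. \]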



Remark 6.4. 

• There are many proofs leading to this kind of result, see for example Aronson [Aro67] 
for pioneering works, and [Wan10] using Harnack's and Pinsker's inequalities. 

• Note that in [GW12], an equivalent bound was given for the kinetic Fokker-Planck 
equation but with $\varepsilon$ replaced by $\varepsilon^3$. 

Now that we have a decay in Wasserstein distance and a control on the total variation 
distance after a small time, we can use the same idea as for the TCP process. As a 
consequence, we get the following result. 



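Theorem 6.5. Assume that $\sigma$ is constant. Under Points 1. or 2. of Proposition 6.1, one 
has, for any $\nu$ and $\tilde\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$, 

\[ \|\nu P_t - \tilde\nu P_t\|_{\mathrm{TV}} \le \frac{K_\eta\, e^{\lambda \varepsilon}}{\sqrt{\varepsilon}}\, W_1(\nu, \tilde\nu)\, e^{-\lambda t}. \]

Under Point 3. of Proposition 6.1, one has, for any $\nu$ and $\tilde\nu$ in $\mathcal{P}_2(\mathbb{R}^d)$, 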



Proof. Using first Lemma 6.3 and then Point 1. of Proposition 6.1, we get 

\[ \|\nu P_t - \tilde\nu P_t\|_{\mathrm{TV}} = \|\nu P_{t-\varepsilon} P_\varepsilon - \tilde\nu P_{t-\varepsilon} P_\varepsilon\|_{\mathrm{TV}} \le \frac{K_\eta}{\sqrt{\varepsilon}}\, W_1(\nu P_{t-\varepsilon}, \tilde\nu P_{t-\varepsilon}) \le \frac{K_\eta}{\sqrt{\varepsilon}}\, W_1(\nu, \tilde\nu)\, e^{-\lambda(t-\varepsilon)}. \]

The proof of the second assertion is similar, except the use of Point 3. of Proposition 6.1 
in the second step. □ 

Remark 6.6. Once again, one can give, in the case of the kinetic Fokker-Planck equation, an 
estimate in total variation distance, using the previous remarks (see [BCG08] for qualitative 
results). 